Discovery of unconventional patterns for sequence analysis: theory and algorithms

Authors

  • Giovanni Battaglia
  • Roberto Grossi
  • Nadia Pisanti
  • Roberto Marangoni
  • Giulia Menconi

Abstract

The pattern discovery task (or, equivalently, motif inference) is the knowledge discovery process that, given a dataset and some constraints either on the combinatorial structure of the patterns or on their occurrence lists, returns all the patterns satisfying the given constraints. In this thesis, we consider the problem of discovering patterns in sequential data, such as texts, biological sequences, and access logs. Several classes of patterns have been proposed in the literature. Examples are rigid patterns like p = c◦tc, where the don’t care symbol ◦ matches any single character of the input alphabet Σ, and gapped patterns like p = ctt−2,3−tc, where the gap stands for a sequence of either 2 or 3 don’t cares (we refer to [154] for a thorough discussion of these classes of patterns). The adjective “unconventional” in the title of this thesis refers to the unusual combinatorial structure of the patterns we investigate. While the classic literature in this field focuses on string patterns (possibly with wildcards), our line of research explores three different kinds of patterns: mask patterns, where each pattern represents a set of string patterns with wildcards; permutation patterns, where each pattern is a multiset of characters and the order of the symbols does not matter; and transposons, which, roughly speaking, represent the non-conserved regions of a global alignment.
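
As a small illustration of the two conventional pattern classes mentioned in the abstract (and not of the algorithms developed in the thesis), the following Python sketch translates a rigid pattern and a gapped pattern into regular expressions and reports where they occur in a toy text. The function names, the list-and-tuple encoding of gaps, and the sample text are assumptions made only for this example.

```python
import re

# Assumption: the don't care symbol is written as U+25E6 (the "◦" character).
DONT_CARE = "\u25e6"


def rigid_to_regex(pattern):
    """Translate a rigid pattern: each don't care matches exactly one character."""
    return "".join("." if ch == DONT_CARE else re.escape(ch) for ch in pattern)


def gapped_to_regex(parts):
    """Translate a gapped pattern given as a list of solid strings and
    (min, max) tuples, where a tuple is a gap of min..max don't cares.
    This encoding is hypothetical, chosen for readability."""
    pieces = []
    for part in parts:
        if isinstance(part, tuple):
            lo, hi = part
            pieces.append(".{%d,%d}" % (lo, hi))
        else:
            pieces.append(re.escape(part))
    return "".join(pieces)


def occurrences(regex, text):
    """Return every start position of a match, overlapping ones included.
    The lookahead keeps the scan from consuming characters, and DOTALL lets
    '.' stand for any single character, newlines included."""
    scanner = re.compile("(?=" + regex + ")", re.DOTALL)
    return [m.start() for m in scanner.finditer(text)]


text = "catcgtcttaatcc"
rigid = rigid_to_regex("c" + DONT_CARE + "tc")     # regex: c.tc
gapped = gapped_to_regex(["ctt", (2, 3), "tc"])    # regex: ctt.{2,3}tc

print(occurrences(rigid, text))    # [0, 3]
print(occurrences(gapped, text))   # [6]
```

This only checks occurrences of patterns that are already known. The discovery task described above is the harder inverse problem of enumerating all patterns that satisfy the given constraints.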

Related resources

Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

The rapid growth of information technology (IT) motivates and creates competitive advantages in the health care industry. Nowadays, many hospitals try to build successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction, and ultimately maximize their profitability. Many hospitals have large data warehouses containing customer ...

Analysis of loan transactions and resource circulation in the libraries of Birjand University of Medical Sciences using data mining algorithms

Introduction: Data mining is a process for discovering meaningful relationships and patterns in data. Identifying the behavior patterns of library users can help improve decision-making in libraries. This study aimed to analyze the interlibrary loan transactions at Birjand University of Medical Sciences using data mining algorithms. Methods: In this descriptive study, knowledge discovery and d...

Identification of Fraud in Banking Data and Financial Institutions Using Classification Algorithms

In recent years, due to the expansion of financial institutions, as well as the popularity of the World Wide Web and e-commerce, a significant increase in the volume of financial transactions has been observed. In addition to the increase in turnover, a sharp rise in fraud arising from abnormal user behavior is causing billions of dollars in losses worldwide. T...

A Framework for Exploring the Frequent Patterns based on Activities Sequence

In recent years, the growing use of location-based tools has made it possible to produce geometric trajectories from users' movement paths. In this way, users' travel goals and related activities can be considered in addition to the geometry and shape of the route. The user activity trajectory represents the sequence of the visited activities and its related analysis as presented i...

Publication date: 2011